Abstract — We devise an analytically simple as well as invertible approximate expression, which describes the relation between the minimum distance of a binary code and the corresponding maximum attainable code rate. For example, for a rate-1/4, length-256 binary code the best known bounds limit the attainable minimum distance to $65 \le d(n = 256, k = 64) \le 90$, while our solution yields $d(n = 256, k = 64) = 74.4$. The proposed formula attains an approximation accuracy within the rounding error. As a result, the exact minimum distance lies between $\lfloor d \rfloor$ and $\lceil d \rceil$ of the approximated value $d$ for $\approx 97\%$ of the $(n, k)$ scenarios in which the exact minimum distance is known. The results provided may be utilized for the analysis and design of efficient communication systems.



I. Introduction 

One of the fundamental open problems in coding theory is that of determining the highest cardinality $|C| = 2^k$ attainable by a binary code $C$ of length $n$ having a rate of $r = k/n$ and a minimum distance of $d$ [1]. Here the minimum distance $d$ is defined as the minimum Hamming distance between any two distinct codewords in the codebook $C$. In addition to its theoretical significance, the problem appears in numerous important applications, including the design of efficient coding schemes and their characterization in terms of the achievable probability of error. No complete solution of the rate-versus-minimum-distance problem is known at the time of writing, although several theoretical lower and upper bounds on the desired relation may be found in the literature [1]–[5]. In particular, the tightest known bounding characteristics are provided by the code tables compiled by Grassl et al. in [7], which draw on a variety of theoretical as well as empirical sources [6].

Specifically, some of the best known asymptotic ($n \to \infty$) as well as finite-$n$ lower and upper bounds are summarized in Table I. There we define the binary entropy function $H(q) = -q\log_2(q) - (1 - q)\log_2(1 - q)$ and denote the normalized minimum distance as $\delta = d/n$. More specifically, the tightest known asymptotic lower bound was derived by Gilbert [3], while the corresponding upper bounds were devised by Hamming [2] and by McEliece et al. (MRRW) [5]. The prominent asymptotic lower and upper bounds are depicted in Figure 1. Furthermore, the best known finite-$n$ bounds are the Gilbert lower bound and the Hamming and Plotkin upper bounds [?]. The finite-$n$ lower and upper bounds for the specific case of $n = 7$ are depicted in Figure 2.
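The following Python sketch evaluates the finite-$n$ Gilbert–Varshamov, Hamming and Plotkin expressions of Table I for the $n = 7$ case of Figure 2, which makes those rows concrete. It is a minimal illustration of our own that assumes the Table I formulas as written. The function names are hypothetical. The Plotkin expression does not apply for $d \le n/2$, so it is treated as non-binding (infinite) there.

```python
import math


def gilbert_rate(n: int, d: int) -> float:
    """Finite-n Gilbert-Varshamov lower bound: r >= 1 - log2(sum_{i<d} C(n, i)) / n."""
    return 1 - math.log2(sum(math.comb(n, i) for i in range(d))) / n


def hamming_rate(n: int, d: int) -> float:
    """Finite-n Hamming (sphere-packing) upper bound with t = floor((d - 1) / 2)."""
    t = (d - 1) // 2
    return 1 - math.log2(sum(math.comb(n, i) for i in range(t + 1))) / n


def plotkin_rate(n: int, d: int) -> float:
    """Finite-n Plotkin upper bound [1 - log2(2 - 1/delta)] / n, for d > n/2.

    For d <= n/2 the expression does not constrain r, so infinity is returned.
    """
    if 2 * d <= n:
        return math.inf
    delta = d / n
    return (1 - math.log2(2 - 1 / delta)) / n


if __name__ == "__main__":
    n = 7
    for d in range(1, n + 1):
        print(f"d={d}: Gilbert {gilbert_rate(n, d):.3f}  "
              f"Hamming {hamming_rate(n, d):.3f}  Plotkin {plotkin_rate(n, d):.3f}")
```

For example, `hamming_rate(7, 3)` gives 4/7, which is the rate of the [7, 4] Hamming code. Likewise, `plotkin_rate(7, 4)` gives 3/7, the rate of the length-7 simplex code.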



Unfortunately, however, most of the available theoretical as well as empirical bounds are notoriously difficult to use in practice. On the one hand, as may be inferred from Figures 1 and 2, the asymptotic bounds provide little information about the desired characteristics of a wide range of finite-$n$ scenarios routinely encountered in practical applications. On the other hand, the theoretical bounds corresponding to the finite-$n$ cases involve excessively complex numerical computations. Against this background, the novel contribution of this paper is an analytically simple as well as invertible expression $r(n, \delta)$. This expression complies with all known theoretical bounds in both finite-$n$ and asymptotic ($n \to \infty$) contexts while accurately approximating the empirical bounds. It thus provides a practical tool for the analysis and design of efficient binary codes. We would like to explicitly emphasise the applied nature of this study, which is aimed at developing a methodology for the analysis and optimization of communication networks, discussed, for example, in [8].



II. Rate versus minimum distance trade-off 

Firstly, let us consider three special cases in which the exact value of the maximum minimum distance $d$ is known.

a) For a unity-rate binary code of length $n = 1, 2, \ldots$, we have $d = 1$.

b) The simplex code of block length $n = 2^k - 1$, $k = 1, 2, \ldots$, exhibits a rate of $r = k/(2^k - 1)$ and a constant Hamming distance of $d = 2^{k-1}$ between any pair of distinct codewords.

c) For any block length $n = 1, 2, \ldots$, we may consider the optimum rate-$(r = 1/n)$ $n$-repetition code, which conveys a single bit of information and exhibits $d = n$.
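For small parameters, special cases (a)–(c) can be confirmed by exhaustive search. The sketch below is our own illustration, and its function names are hypothetical. It builds each code explicitly, using the standard simplex generator matrix whose columns are all nonzero binary $k$-tuples, and collects the Hamming distances between all pairs of distinct codewords.

```python
from itertools import combinations, product


def simplex_code(k: int) -> list[tuple[int, ...]]:
    """Binary simplex code of dimension k and length n = 2^k - 1.

    Bit j (j = 1..n) of the codeword for message m is the parity of (m AND j),
    i.e. the generator-matrix columns are all nonzero binary k-tuples.
    """
    n = 2**k - 1
    return [tuple(bin(m & j).count("1") % 2 for j in range(1, n + 1))
            for m in range(2**k)]


def repetition_code(n: int) -> list[tuple[int, ...]]:
    """Rate-1/n repetition code: the all-zero and all-one words."""
    return [(0,) * n, (1,) * n]


def unity_rate_code(n: int) -> list[tuple[int, ...]]:
    """Rate-1 code: all 2^n binary words of length n."""
    return list(product((0, 1), repeat=n))


def pairwise_distances(code) -> set[int]:
    """Set of Hamming distances over all pairs of distinct codewords."""
    return {sum(x != y for x, y in zip(u, v)) for u, v in combinations(code, 2)}


if __name__ == "__main__":
    for k in (2, 3, 4):
        print(f"simplex k={k}, n={2**k - 1}:", pairwise_distances(simplex_code(k)))
    print("repetition n=5:", min(pairwise_distances(repetition_code(5))))
    print("unity rate n=4:", min(pairwise_distances(unity_rate_code(4))))
```

For $k = 2, 3, 4$, each simplex code should report the single distance $2^{k-1}$. This is the constant-distance property stated in (b).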

Secondly, we would like to point out the following list of 
important empirical observations. 

i) As confirmed by Figure 1, the simple quadratic function

$$ r(\delta) = (2\delta - 1)^2 \tag{1} $$

provides an accurate approximation of the empirical lower bound [7] for the code length of $n = 256$ and rates in excess of 0.2. Notably, (1) satisfies all known asymptotic bounds summarized in Table I over the entire range of practically significant code rates.¹ These comprise the upper MRRW [5] and Hamming [2] bounds, as well as the lower Gilbert–Varshamov [3] bound.

TABLE I

Known bounds on the maximum code rate achievable for a given $(n, d)$ (finite-length case) or $\delta$ (asymptotic case).

| Bound | Finite $n$ | Asymptotic ($n \to \infty$) | Remark |
|---|---|---|---|
| Varshamov–Gilbert [3] | $r \ge 1 - \frac{1}{n}\log_2 \sum_{i=0}^{d-1} \binom{n}{i}$ | $r \ge 1 - H(\delta)$ | tightest known lower bound |
| Hamming [2] | $r \le 1 - \frac{1}{n}\log_2 \sum_{i=0}^{\lfloor (d-1)/2 \rfloor} \binom{n}{i}$ | $r \le 1 - H(\delta/2)$ | tight upper bound for very high-rate codes |
| MRRW [5] | | $r \le H\left(\frac{1}{2} - \sqrt{\delta(1 - \delta)}\right)$ | tightest known asymptotic upper bound for medium- and low-rate codes |
| Plotkin [?] | $r \le \frac{1}{n}\left[1 - \log_2\left(2 - \frac{1}{\delta}\right)\right]$ | | very tight upper bound for $\delta > 1/2$ |

ii) As exemplified by the specific case of $n = 7$, portrayed in Figure 2, the actual achievable values $r(\delta)$ constitute a discrete function, which cannot have an exact monotonic analytical description.

iii) As may be inferred from comparing Figures 1 and 2, the asymptotic bounds of Figure 1 provide little useful information about the desired characteristics of short codes having $1 < n \ll 100$. Such codes are of considerable practical importance in the design of, for example, interactive real-time speech and video systems.

iv) As further suggested by the specific example of $n = 7$, both the finite-$n$ Gilbert and Hamming bounds are relatively loose, while the Plotkin bound is tight for $\delta \ge \lceil n/2 \rceil / n$.

v) In the special cases (b) and (c) considered above, the Plotkin upper bound coincides with the actual achievable maximum rate $r$. This further substantiates the assumption that the Plotkin bound is the tightest possible analytical bound in the $\delta \ge \lceil n/2 \rceil / n$ range.
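Observation (i) and footnote 1 can be probed numerically. The following sketch is our own check using the asymptotic expressions of Table I. It scans $\delta$ over the range in which (1) yields $r \le r_{\max}$ and collects every grid point where (1) falls below the Gilbert–Varshamov bound or exceeds the Hamming or MRRW bound. The grid size and tolerance are arbitrary choices.

```python
import math


def binary_entropy(q: float) -> float:
    """H(q) = -q log2 q - (1 - q) log2(1 - q), with H(0) = H(1) = 0."""
    if q <= 0.0 or q >= 1.0:
        return 0.0
    return -q * math.log2(q) - (1 - q) * math.log2(1 - q)


def quadratic_rate(delta: float) -> float:
    """Empirical asymptotic approximation (1): r = (2 delta - 1)^2."""
    return (2 * delta - 1) ** 2


def bound_violations(r_max: float = 0.9, steps: int = 10_000,
                     tol: float = 1e-12) -> list[float]:
    """Grid points delta in [delta_min, 1/2] at which (1) violates a bound.

    delta_min is where (1) equals r_max, so only rates r <= r_max are scanned.
    """
    delta_min = (1 - math.sqrt(r_max)) / 2
    violations = []
    for i in range(steps + 1):
        delta = delta_min + (0.5 - delta_min) * i / steps
        r = quadratic_rate(delta)
        gilbert = 1 - binary_entropy(delta)                           # lower bound
        hamming = 1 - binary_entropy(delta / 2)                       # upper bound
        mrrw = binary_entropy(0.5 - math.sqrt(delta * (1 - delta)))   # upper bound
        if not (gilbert - tol <= r <= min(hamming, mrrw) + tol):
            violations.append(delta)
    return violations


if __name__ == "__main__":
    print("violations for r <= 0.90:", len(bound_violations(0.90)))
    print("violations for r <= 0.95:", len(bound_violations(0.95)))
```

With $r_{\max} = 0.9$ we expect no violations. Raising $r_{\max}$ to 0.95 exposes grid points at small $\delta$ (below roughly 0.025) where (1) exceeds the asymptotic Hamming bound, in line with footnote 1.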

Taking into consideration observations (i)–(v), we hypothesize a solution exhibiting the following properties:

• Asymptotic quadratic approximation of (1):

$$ \lim_{n \to \infty} r(n, \delta) = (2\delta - 1)^2. \tag{2} $$

• Unity-rate special case (a):

$$ r(n, 1/n) = 1. \tag{3} $$

• Plotkin bound [?] and special cases (b) and (c):

$$ r\left(n, \delta \ge \frac{\lceil n/2 \rceil}{n}\right) = \frac{1}{n}\left[1 - \log_2\left(2 - \frac{1}{\delta}\right)\right]. \tag{4} $$
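As a quick consistency check of (4), substituting special cases (b) and (c) into its right-hand side recovers their rates exactly. This worked example is our own and is not part of the original text. For the simplex code, $n = 2^k - 1$ and $d = 2^{k-1} = \lceil n/2 \rceil$, so $\delta$ lies exactly on the boundary $\delta = \lceil n/2 \rceil / n$ of (4), and

$$ \frac{1}{\delta} = \frac{2^k - 1}{2^{k-1}} = 2 - 2^{1-k} \;\Longrightarrow\; \frac{1}{n}\left[1 - \log_2\left(2 - \frac{1}{\delta}\right)\right] = \frac{1}{n}\left[1 - (1 - k)\right] = \frac{k}{2^k - 1}. $$

For the repetition code, $\delta = 1$ gives $\frac{1}{n}\left[1 - \log_2 1\right] = \frac{1}{n}$.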



Specifically, we propose a solution in the form of a smooth two-segment function $r(n, \delta)$ expressed as

$$ r(n, \delta) = \begin{cases} a(n)\,\delta^2 + b(n)\,\delta + c(n), & \text{if } \delta \le \lceil n/2 + \xi(n) \rceil / n, \\ \dfrac{1}{n}\left[1 - \log_2\left(2 - \dfrac{1}{\delta}\right)\right], & \text{otherwise}, \end{cases} \tag{5} $$

where the free parameters² $a$, $b$, $c$ and $\xi$ depend on the code length $n$. They are chosen to ensure that the quadratic constituent in Equation (5) complies with the constraints (2) and (3). Constraint (4), in turn, is automatically obeyed by the logarithmic constituent of (5).

¹ It should be noted that the expression in (1) does not satisfy the asymptotic Hamming upper bound for a hypothetical range of long, very high-rate codes ($n > 200$, $r > 0.9$). Such codes are of no practical significance owing to their low coding gain and excessive decoding complexity.

Fig. 1. Rate versus normalized minimum distance for known asymptotic bounds.

Furthermore, the requirement of smoothness of the expression in (5) imposes the following additional constraints on its quadratic constituent:

• Continuity at the transition point $\delta_2 = \lceil n/2 + \xi \rceil / n$:

$$ r_2 = a\delta_2^2 + b\delta_2 + c = \frac{1}{n}\left[1 - \log_2\left(2 - \frac{1}{\delta_2}\right)\right]. \tag{6} $$



• Continuity of the first derivative at the transition point $\delta_2$. This may be approximately attained by imposing continuity of the discrete function of (5) at the next consecutive point $\delta_3 = (\lceil n/2 + \xi \rceil + 1)/n$, yielding

$$ r_3 = a\delta_3^2 + b\delta_3 + c = \frac{1}{n}\left[1 - \log_2\left(2 - \frac{1}{\delta_3}\right)\right]. \tag{7} $$

² Here and in the following we use $a$, $b$, $c$ and $\xi$ instead of $a(n)$, $b(n)$, $c(n)$ and $\xi(n)$ for the sake of brevity.

Fig. 2. Rate versus normalized minimum distance for finite-length codes.

By combining constraint (3) with (6) and (7), we arrive at a system of three equations, which uniquely determines the values of the parameters $a$, $b$ and $c$. Specifically, we have

$$ \begin{aligned} r_1 &= a\delta_1^2 + b\delta_1 + c, \\ r_2 &= a\delta_2^2 + b\delta_2 + c, \\ r_3 &= a\delta_3^2 + b\delta_3 + c, \end{aligned} \tag{8} $$

where, in addition to the parameters defined in (6) and (7), we have $r_1 = 1$ and $\delta_1 = 1/n$, as required by (3). The general solution of the system of equations in (8) is given by

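$$ \begin{aligned} a &= \frac{r_1(\delta_2 - \delta_3) + r_2(\delta_3 - \delta_1) + r_3(\delta_1 - \delta_2)}{(\delta_1 - \delta_2)(\delta_1 - \delta_3)(\delta_2 - \delta_3)}, \\ b &= \frac{r_1(\delta_3^2 - \delta_2^2) + r_2(\delta_1^2 - \delta_3^2) + r_3(\delta_2^2 - \delta_1^2)}{(\delta_1 - \delta_2)(\delta_1 - \delta_3)(\delta_2 - \delta_3)}, \\ c &= \frac{r_1\delta_2\delta_3(\delta_2 - \delta_3) + r_2\delta_3\delta_1(\delta_3 - \delta_1) + r_3\delta_1\delta_2(\delta_1 - \delta_2)}{(\delta_1 - \delta_2)(\delta_1 - \delta_3)(\delta_2 - \delta_3)}. \end{aligned} \tag{9} $$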



Observe that despite its seemingly complex appearance, Equation (9) contains simple closed-form expressions, which may be readily calculated for any given value of $n$. Furthermore, it may be readily demonstrated that constraint (2) is satisfied if

$$ \lim_{n \to \infty} \xi(n) = \infty \tag{10} $$

and

$$ \lim_{n \to \infty} \frac{\lceil n/2 + \xi(n) \rceil}{n} = \frac{1}{2}, \quad \text{i.e.,} \quad \lim_{n \to \infty} \frac{\xi(n)}{n} = 0. \tag{11} $$
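One way to see why (10) and (11) suffice is sketched below. This is our own heuristic argument rather than a derivation taken from the original text. It ignores the ceiling in $\delta_2$, so that $\delta_2 \approx 1/2 + \xi/n$. The logarithmic segment of (5) has the slope

$$ \frac{\mathrm{d}}{\mathrm{d}\delta}\,\frac{1}{n}\left[1 - \log_2\left(2 - \frac{1}{\delta}\right)\right] = -\frac{1}{n \ln 2\,\delta(2\delta - 1)}. $$

Whenever $\xi/n \to 0$, we have $2 - 1/\delta_2 \approx 4\xi/n$ and $\delta_2(2\delta_2 - 1) \approx \xi/n$. Hence

$$ r_2 \approx \frac{1}{n}\left[1 + \log_2\frac{n}{4\xi}\right] \xrightarrow[n \to \infty]{} 0, \qquad \left.\frac{\mathrm{d}r}{\mathrm{d}\delta}\right|_{\delta_2} \approx -\frac{1}{\xi \ln 2} \xrightarrow[n \to \infty]{} 0. $$

Condition (11) drives $\delta_2 \to 1/2$ and $r_2 \to 0$, while condition (10) drives the slope at $\delta_2$ to zero. In addition, $\delta_1 = 1/n \to 0$ with $r_1 = 1$. The limiting parabola $p(\delta)$ must therefore satisfy $p(0) = 1$, $p(1/2) = 0$ and $p'(1/2) = 0$. The unique solution of these conditions is $p(\delta) = 4(\delta - 1/2)^2 = (2\delta - 1)^2$, i.e. constraint (2).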



Our analysis has shown that any sensible choice of a monotonically increasing function $\xi(n)$ yields similar results, provided it satisfies conditions (10) and (11) as well as $0 \le \xi(1) < 1$. Specifically, in this study we assume

$$ \xi(n) = \frac{\log_2(n)}{2}. \tag{12} $$



Table II summarizes example values of the parameters $a$, $b$, $c$ and $\xi$ for various code lengths $n$, calculated using Equations (9) and (12).
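The recipe of (8), (9) and (12) is short enough to sketch directly. The Python below is our own illustration, not the authors' code. It builds the three anchor points $(\delta_i, r_i)$, evaluates the closed-form solution (9) and the two-segment expression (5), and prints the rows of Table II. The function names are hypothetical. We assume $n \ge 2$, so that the three anchor points are distinct.

```python
import math


def xi(n: int) -> float:
    """Transition offset of Eq. (12): xi(n) = log2(n) / 2."""
    return math.log2(n) / 2


def plotkin_segment(n: int, delta: float) -> float:
    """Logarithmic segment of Eq. (5); requires delta > 1/2."""
    return (1 - math.log2(2 - 1 / delta)) / n


def quadratic_coefficients(n: int) -> tuple[float, float, float]:
    """Coefficients a, b, c from the closed-form solution (9) of system (8)."""
    if n < 2:
        raise ValueError("n must be at least 2")
    m = math.ceil(n / 2 + xi(n))            # delta_2 = m / n, delta_3 = (m + 1) / n
    d1, d2, d3 = 1 / n, m / n, (m + 1) / n
    r1, r2, r3 = 1.0, plotkin_segment(n, d2), plotkin_segment(n, d3)  # (3), (6), (7)
    den = (d1 - d2) * (d1 - d3) * (d2 - d3)
    a = (r1 * (d2 - d3) + r2 * (d3 - d1) + r3 * (d1 - d2)) / den
    b = (r1 * (d3**2 - d2**2) + r2 * (d1**2 - d3**2) + r3 * (d2**2 - d1**2)) / den
    c = (r1 * d2 * d3 * (d2 - d3) + r2 * d3 * d1 * (d3 - d1)
         + r3 * d1 * d2 * (d1 - d2)) / den
    return a, b, c


def rate(n: int, delta: float) -> float:
    """Two-segment approximation r(n, delta) of Eq. (5)."""
    if delta <= math.ceil(n / 2 + xi(n)) / n:
        a, b, c = quadratic_coefficients(n)
        return a * delta**2 + b * delta + c
    return plotkin_segment(n, delta)


if __name__ == "__main__":
    print("   n      a      b      c    xi")
    for n in (4, 8, 16, 32, 64, 128, 256):
        a, b, c = quadratic_coefficients(n)
        print(f"{n:4d} {a:6.2f} {b:6.2f} {c:6.2f} {xi(n):5.1f}")
```

The printed coefficients should agree with Table II to within about 0.01. By construction, `rate(n, 1 / n)` returns 1, and the two segments coincide at $\delta_2$ and $\delta_3$.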

The resultant expression $r(n, \delta)$ of Equation (5) is compared to the available theoretical and empirical bounds in Figures 1 and 2. Figure 1 covers the asymptotic case ($n \to \infty$), where the original quadratic expression $r(n, \delta) \to (2\delta - 1)^2$ applies. Figure 2 covers the finite-$n$ cases of $n = 3$, 7, 16 and 128.

TABLE II

Approximation parameters $a$, $b$, $c$ and $\xi$ calculated using (9) and (12) for some values of the code length $n$.

| $n$ | $a$ | $b$ | $c$ | $\xi$ |
|---|---|---|---|---|
| 4 | 0.83 | -2.04 | 1.46 | 1.0 |
| 8 | 1.23 | -2.36 | 1.28 | 1.5 |
| 16 | 1.72 | -2.70 | 1.16 | 2.0 |
| 32 | 2.17 | -2.99 | 1.09 | 2.5 |
| 64 | 2.57 | -3.22 | 1.05 | 3.0 |
| 128 | 2.92 | -3.42 | 1.03 | 3.5 |
| 256 | 3.11 | -3.53 | 1.01 | 4.0 |

Fig. 3. Minimum distance versus code length for binary codes of rates $r = 1/3$ and $2/3$, as calculated using the theoretical Gilbert and Hamming bounds [2], [3], the empirical Grassl bounds [7], as well as the proposed expression of (13).

Fig. 4. Histogram of the approximation error $e[n, k] = d(n, k) - d_{\mathrm{Grassl}}[n, k]$, based on the 3856 scenarios ($16 \le n \le 256$, $\{1 \le k \le 8,\ (n - 7) \le k \le n\}$) for which the exact maximum minimum distance $d_{\mathrm{Grassl}}[n, k]$ is known [7]. An approximation accuracy of $|e[n, k]| < 1$ was achieved in 97% of the cases considered, and an accuracy of $1 \le |e[n, k]| < 2$ in the remaining 3%.

Expression (5) may be deemed analytically simple, since it has a closed form and is composed of elementary functions. Moreover, (5) is readily invertible, yielding

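$$ \delta(n, r) = \begin{cases} \dfrac{-b - \sqrt{b^2 - 4a(c - r)}}{2a}, & \text{if } r > \dfrac{1}{n}\log_2(n + 1), \\ \left(2 - 2^{1 - rn}\right)^{-1}, & \text{otherwise}, \end{cases} \tag{13} $$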



where the coefficients $a$, $b$ and $c$ may be readily calculated using (9)–(12). In the asymptotic case $n \to \infty$, which in practice may be safely employed for all scenarios having $n \gg 100$, we may simply use the inverse of (1) on its decreasing branch $\delta \le 1/2$. This yields $\delta(r) = (1 - \sqrt{r})/2$.
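Below is a minimal sketch of the inversion (13). It is again our own illustration, and its function names are hypothetical. It recomputes $a$, $b$ and $c$ in the Lagrange-interpolation form, which is algebraically equivalent to (9). Above the threshold $r > \log_2(n+1)/n$ it uses the quadratic branch, and below it the inverted Plotkin segment. The asymptotic inverse $(1 - \sqrt{r})/2$ is included as well.

```python
import math


def _plotkin_segment(n: int, delta: float) -> float:
    # Logarithmic segment of Eq. (5)
    return (1 - math.log2(2 - 1 / delta)) / n


def coefficients(n: int) -> tuple[float, float, float]:
    """a, b, c of the quadratic segment via Eqs. (8), (9) and xi(n) = log2(n)/2."""
    m = math.ceil(n / 2 + math.log2(n) / 2)
    x = (1 / n, m / n, (m + 1) / n)
    y = (1.0, _plotkin_segment(n, x[1]), _plotkin_segment(n, x[2]))
    a = b = c = 0.0
    for i in range(3):
        p, q = (i + 1) % 3, (i + 2) % 3
        # y_i times the Lagrange basis (delta - x_p)(delta - x_q) / ((x_i - x_p)(x_i - x_q))
        w = y[i] / ((x[i] - x[p]) * (x[i] - x[q]))
        a += w
        b -= w * (x[p] + x[q])
        c += w * x[p] * x[q]
    return a, b, c


def delta_hat(n: int, r: float) -> float:
    """Normalized minimum distance delta(n, r) of Eq. (13)."""
    if r > math.log2(n + 1) / n:
        a, b, c = coefficients(n)
        return (-b - math.sqrt(b * b - 4 * a * (c - r))) / (2 * a)
    return 1 / (2 - 2 ** (1 - r * n))        # inverse of the Plotkin segment


def d_hat(n: int, k: int) -> float:
    """Approximate maximum minimum distance n * delta(n, k/n)."""
    return n * delta_hat(n, k / n)


def delta_asymptotic(r: float) -> float:
    """n -> infinity: inverse of (1) on the branch delta <= 1/2."""
    return (1 - math.sqrt(r)) / 2


if __name__ == "__main__":
    print(f"d(256, 64) = {d_hat(256, 64):.1f}")
    print(f"simplex [31, 5]: d = {d_hat(31, 5):.6f}")
```

With these definitions, `d_hat(256, 64)` evaluates to about 74.4, matching the example in the abstract. The simplex codes of special case (b) have $r = \log_2(n+1)/n$ exactly. They therefore fall on the Plotkin branch of (13), which returns $d = 2^{k-1}$.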

Figure 3 compares the formula of Equation (13) with the best available theoretical and empirical upper and lower bounds for the specific cases of rate-1/3 and rate-2/3 binary codes. Observe that the Hamming and Gilbert theoretical bounds leave considerable ambiguity in terms of the attainable minimum distance $d(n, r)$. Furthermore, the devised expression of (13) provides an accurate approximation of the available empirical Grassl bounds [7] for both the $r = 1/3$ and the $r = 2/3$ case.

The approximation accuracy of the proposed formula of (13) was further tested using the 3856 scenarios ($16 \le n \le 256$, $\{1 \le k \le 8,\ (n - 7) \le k \le n\}$) for which the exact maximum minimum distance $d_{\mathrm{Grassl}}[n, k]$ is known from [7]. Figure 4 depicts the histogram of the resultant approximation error $e[n, k] = n\delta(n, k/n) - d_{\mathrm{Grassl}}[n, k]$, calculated using (13). Specifically, an accuracy of $|e[n, k]| < 1$ was achieved in approximately 97% of the cases considered. This implies that the desired value $d_{\mathrm{Grassl}}[n, k]$ was the nearest integer above or below the real number $d(n, k)$ provided by the approximation formula. In the remaining 3% of the cases, an approximation accuracy of $1 \le |e[n, k]| < 2$ was achieved.
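The 97%/3% split reported above amounts to binning the errors $e[n, k]$ by magnitude. The sketch below shows that bookkeeping. It also illustrates that, for integer exact distances, $|e| < 1$ is equivalent to the rounding criterion $\lfloor d \rfloor \le d_{\mathrm{Grassl}} \le \lceil d \rceil$ used in the abstract. This is our own illustration, and the Grassl data [7] is not reproduced here. The numbers in the usage line are made up purely to exercise the functions.

```python
import math
from collections import Counter


def within_rounding(d_approx: float, d_exact: int) -> bool:
    """Rounding criterion: floor(d_approx) <= d_exact <= ceil(d_approx)."""
    return math.floor(d_approx) <= d_exact <= math.ceil(d_approx)


def error_bins(d_approx: list[float], d_exact: list[int]) -> Counter:
    """Bin e = d_approx - d_exact by |e|, as in the histogram of Figure 4.

    For integer d_exact, |e| < 1 holds exactly when within_rounding() is True.
    """
    if len(d_approx) != len(d_exact):
        raise ValueError("inputs must have equal length")
    bins = Counter()
    for approx, exact in zip(d_approx, d_exact):
        e = abs(approx - exact)
        bins["|e| < 1" if e < 1 else "1 <= |e| < 2" if e < 2 else "|e| >= 2"] += 1
    return bins


if __name__ == "__main__":
    # Made-up numbers, only to exercise the function (not data from [7]).
    print(error_bins([4.3, 7.9, 12.6], [5, 6, 12]))
```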
III. Conclusion

We formulated an analytically simple as well as invertible expression $r(n, \delta)$, which approximates the optimum trade-off between the maximum rate and the corresponding maximum minimum distance attainable by binary codes of length $n$. The resultant closed-form analytical expression accurately approximates the best available empirical bounds and complies with all known theoretical bounds in both finite-$n$ and asymptotic ($n \to \infty$) contexts.

For example, for a rate-1/4, length-256 binary code the best known bounds limit the attainable minimum distance to $65 \le d(n = 256, k = 64) \le 90$, while our solution yields $d(n = 256, k = 64) = 74.4$. The proposed formula attains an approximation accuracy within the rounding error. It thus satisfies the condition $\lfloor d(n, k) \rfloor \le d_{\mathrm{Grassl}}[n, k] \le \lceil d(n, k) \rceil$ for $\approx 97\%$ of the $(n, k)$ scenarios in which the exact value of the maximum minimum distance $d_{\mathrm{Grassl}}[n, k]$ is known. Furthermore, the condition $\lfloor d(n, k) - 1 \rfloor \le d_{\mathrm{Grassl}}[n, k] \le \lceil d(n, k) + 1 \rceil$ is satisfied in 100% of the cases considered. Consequently, the proposed method provides a practical tool for the design and characterization of efficient communication systems.



